feat(scan): external CycloneDX SBOM ingest endpoint #406

Merged
haksungjang merged 7 commits into main from feat/sbom-ingest-endpoint
Jun 14, 2026
@haksungjang

Contributor

What

Adds POST /v1/projects/{project_id}/sbom-ingest so external tools (CI, cdxgen-based scanners) can upload an already-generated CycloneDX SBOM. TRUSCA runs the back half of the scan pipeline against it — persist components → trivy sbom matching → findings — reusing the Scan model so ingested scans get ref-keyed retention, the per-project active-scan guard, and the existing Components/Vulnerabilities/Licenses UI and build gate for free.

Builds on #404 (sbom scan kind) and #405 (shared pipeline helpers).

Not Dependency-Track compatible. This is a TRUSCA-native surface (Authorization: Bearer, field sbom, no autoCreate), not DT's POST /api/v1/bom + X-Api-Key. A tool currently posting to DT must add a TRUSCA uploader mode.

Contract

POST /v1/projects/{project_id}/sbom-ingest
Authorization: Bearer tos_…            # API key or JWT, developer role
multipart/form-data: sbom=@bom.cdx.json  ref=main  release=v1.2.3
→ 202 ScanPublic { id, kind:"sbom", status:"queued", celery_task_id }

Poll GET /v1/scans/{id} to completion (same as the GitHub Action flow).
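A minimal client-side sketch of the contract above, using only the standard library. The helper names (`build_multipart`, `poll_scan`), the poll interval, and the set of terminal statuses are assumptions for illustration; only the field names `sbom`, `ref`, `release` and the `queued` / `succeeded` / `failed` statuses come from this PR.

```python
import time
import uuid
from typing import Callable

# Assumed terminal states; the real API may define more.
TERMINAL_STATUSES = {"succeeded", "failed"}


def build_multipart(fields: dict, filename: str, payload: bytes) -> tuple:
    """Encode text fields plus one file part named ``sbom``.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="sbom"; filename="{filename}"\r\n'
            "Content-Type: application/json\r\n\r\n"
        ).encode()
        + payload
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def poll_scan(fetch: Callable[[], dict], interval: float = 5.0, max_polls: int = 120) -> dict:
    """Call ``fetch`` (a wrapper around GET /v1/scans/{id}) until a terminal status."""
    for _ in range(max_polls):
        scan = fetch()
        if scan["status"] in TERMINAL_STATUSES:
            return scan
        time.sleep(interval)
    raise TimeoutError("scan did not reach a terminal status in time")
```

The body and content type can be handed to any HTTP client together with an `Authorization: Bearer …` header; `fetch` is injected so the polling loop can be exercised without a network.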

Endpoint / service

  • Reuses trigger_scan's guards via an extracted prepare_scan_target (behavior-preserving): existence/team 404/403 before archived 409 / cap 429 — authz/existence always before state (CLAUDE.md §2 rule 1).
  • Synchronous adversarial validation of untrusted input: bounded read (SBOM_INGEST_MAX_BYTES, 32 MiB → 413), content-type/filename allow-list (415), JSON + CycloneDX structure whitelist (422), component cap (SBOM_INGEST_MAX_COMPONENTS, 50k → 422), and an O(n) string-aware byte nesting-depth pre-check so a deeply nested document is a clean 422 instead of RecursionError → 500 from json.loads. RFC 7807 throughout.
  • Atomic: flush wins the active-scan race before the file is written; a 409 loser writes no file; commit-race deletes the file; enqueue failure flips the row to failed → 503.
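The string-aware nesting-depth pre-check from the validation bullet can be sketched as a single pass over the raw bytes. This is an illustration of the technique, not the PR's code: the function name and the `MAX_DEPTH` value are assumptions.

```python
MAX_DEPTH = 64  # hypothetical limit; the PR does not state the real value


def max_nesting_depth(raw: bytes) -> int:
    """Return the deepest {}/[] nesting in ``raw`` without parsing it.

    Brackets inside JSON strings are ignored, and backslash escapes are
    tracked so an escaped quote does not end the string early.
    """
    depth = 0
    deepest = 0
    in_string = False
    escaped = False
    for byte in raw:
        if in_string:
            if escaped:
                escaped = False
            elif byte == 0x5C:  # backslash starts an escape
                escaped = True
            elif byte == 0x22:  # unescaped quote closes the string
                in_string = False
        elif byte == 0x22:  # quote opens a string
            in_string = True
        elif byte in (0x7B, 0x5B):  # '{' or '['
            depth += 1
            deepest = max(deepest, depth)
        elif byte in (0x7D, 0x5D):  # '}' or ']'
            depth -= 1
    return deepest


def is_too_deep(raw: bytes, limit: int = MAX_DEPTH) -> bool:
    """True when the document should be rejected before json.loads runs."""
    return max_nesting_depth(raw) > limit
```

Working on bytes is safe for UTF-8 input because every byte of a multi-byte sequence is 0x80 or above, so it can never be mistaken for a quote, backslash, or bracket. Malformed input may drive `depth` negative, which is harmless here since the JSON parser rejects it afterwards.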

Celery task

ingest_sbom_task reuses persist_sbom_components → run_trivy_sbom → persist_trivy_findings → mark_succeeded (ref-keyed supersede). Preserves the uploaded SBOM as a durable sbom_cyclonedx ScanArtifact (so the signature/bundle surface works) and containment-guards the path under workspace_root().
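A containment guard of the kind described can be sketched with `pathlib`. The function name and error type below are assumptions; the real task resolves against `workspace_root()`, which is passed in here as a plain `root` argument.

```python
from pathlib import Path


def resolve_inside(root: Path, candidate: Path) -> Path:
    """Resolve ``candidate`` under ``root`` and refuse anything that escapes it.

    ``resolve()`` collapses ``..`` segments and follows symlinks, so the check
    runs on the final location rather than on the string as written.
    """
    root_resolved = root.resolve()
    resolved = (root_resolved / candidate).resolve()
    if not resolved.is_relative_to(root_resolved):
        raise ValueError(f"path escapes workspace root: {candidate}")
    return resolved
```

An absolute `candidate` replaces the root when joined, so it is rejected by the same check unless it already lives under the root.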

Filled: components, vulnerabilities (Trivy), declared licenses, dependency graph, build gate. Not filled (documented): scancode-detected / registry-concluded licenses, cosign signing, source preservation — these need a source tree.

Security — Producer-Reviewer

security-reviewer ran (CLAUDE.md §7): 0 Critical/High, 2 Medium, 2 Low, 1 Info. Addressed in this PR:

  • Medium: bind_audit_team(project.team_id) before the scan INSERT so the audit row carries team_id (was NULL, dropping ingest mutations out of team-scoped audit views).
  • Low: disk-write failure → 503 SbomIngestStorageError (retryable), not a misleading 422.
  • Info: release / original_filename length-capped + control-byte stripped (parity with trigger_scan's mask_pii).
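The metadata cleaning in the last item could look roughly like the sketch below. The function name, the 255-character cap, and the choice to map an empty result to `None` are assumptions, not taken from the PR.

```python
import re
from typing import Optional

# C0 control characters plus DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_METADATA_LEN = 255  # hypothetical cap


def clean_metadata(value: Optional[str], max_len: int = MAX_METADATA_LEN) -> Optional[str]:
    """Strip control characters, trim whitespace, and cap the length."""
    if value is None:
        return None
    cleaned = _CONTROL_CHARS.sub("", value).strip()
    return cleaned[:max_len] or None
```

Stripping before truncating means the cap applies to the visible text, so control bytes cannot be used to push real content past the limit.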

Deferred follow-ups (tracked, non-blocking):

  • Medium — multipart body is spooled to disk before the bounded read; this is parity with the existing source-archive endpoint, which already defers the edge cap to Traefik. Route a single maxRequestBodyBytes devops change covering both upload surfaces.
  • Low — adversarial component fields (oversized purl / control bytes) abort the scan task in the shared persist_sbom_components (contained: fails only the attacker's own scan). Harden the shared persist path to skip-and-log — affects the cdxgen pipeline too, so separate.
  • Error type URIs use docs.trustedoss.io (consistent with all 33 existing ones) — fold into the broader TRUSCA rebrand sweep.
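For context on the first deferred item, an application-level bounded read can be sketched as follows. It caps what the handler holds in memory but cannot undo spooling already done by the framework, which is why the edge cap is tracked separately. The names `read_bounded` and `PayloadTooLarge` are hypothetical.

```python
from typing import BinaryIO


class PayloadTooLarge(Exception):
    """Raised when the stream holds more than the allowed number of bytes."""


def read_bounded(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read at most ``max_bytes`` from ``stream``; raise if more is available.

    Each read asks for no more than one byte past the remaining budget, so an
    oversized upload is detected without buffering the excess.
    """
    chunks = []
    total = 0
    while True:
        chunk = stream.read(min(chunk_size, max_bytes + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)
```

A caller would map `PayloadTooLarge` to the 413 response described in the validation list.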

Tests

  • Pure adversarial validator unit suite (versions, bomFormat, specVersion, component cap, content-type, bounded read, depth-bomb regression, metadata cleaning) — runs without DB/Redis.
  • Endpoint permission × state matrix (202, API-key scope 403, 404/409/413/415/422/429, shared rate-limit bucket, atomicity) + new existence-hide-state 409 rows for sbom-ingest.
  • Realistic multi-CVE fixture pipeline test (3 CVEs on one lodash version; components + declared licenses + findings persisted; succeeded + ref-keyed supersede).

Docs

EN + KO ci-integration/sbom-upload.md (ko-style lint 0 findings, Docusaurus builds clean): contract, curl, auth (Bearer not X-Api-Key), filled/not-filled, limits, RFC 7807 errors, DT-incompat caution.

Verification

  • mypy . (full): clean (447 files). ruff check: clean.
  • Validator unit suite passes standalone. Backend integration/E2E run under CI test (backend) (needs Postgres+Redis).

Follow-up (separate PR)

Frontend kind="sbom" badge label + EN/KO i18n.

Add POST /v1/projects/{id}/sbom-ingest so external tools (CI, cdxgen-based
scanners) can upload an already-generated CycloneDX SBOM; TRUSCA runs the
back half of the scan pipeline against it — persist components → trivy sbom
matching → findings — reusing the Scan model so ingested scans get ref-keyed
retention, the per-project active-scan guard, and the existing
Components/Vulnerabilities/Licenses UI and build gate for free.

This is NOT a Dependency-Track compatible surface: it is a TRUSCA-native
endpoint (Authorization: Bearer, field `sbom`, no autoCreate), not DT's
/api/v1/bom + X-Api-Key.

Endpoint / service (services/sbom_ingest_service.py, api/v1/sbom.py):
- multipart sbom + ref + release; 202 ScanPublic (kind="sbom").
- require_role_or_api_key("developer"); project-scoped key must match.
- Reuses trigger_scan's guards via an extracted prepare_scan_target
  (existence/team 404/403 before archived 409 / cap 429 — authz before state).
- Synchronous adversarial validation of untrusted input: bounded read
  (SBOM_INGEST_MAX_BYTES, 32 MiB → 413), content-type/filename allow-list
  (415), JSON + CycloneDX structure whitelist (422), component cap
  (SBOM_INGEST_MAX_COMPONENTS, 50k → 422), and an O(n) string-aware byte
  nesting-depth pre-check so a deeply nested document is a clean 422 instead
  of a RecursionError → 500 from json.loads. RFC 7807 throughout.
- Atomic: flush wins the active-scan race before the file is written; a 409
  loser writes no file; commit-race deletes the file; enqueue failure → 503.
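The "authz before state" ordering in the guard bullet can be illustrated with a small sketch. Everything here is hypothetical (the `Project` shape, `ApiError`, the `max_active` cap, and the exact rule for choosing 404 versus 403); only the order of the checks reflects the PR text.

```python
from dataclasses import dataclass
from typing import Optional


class ApiError(Exception):
    def __init__(self, status: int, title: str) -> None:
        super().__init__(title)
        self.status = status


@dataclass
class Project:
    team_id: int
    archived: bool
    active_scans: int


def prepare_scan_target_sketch(
    project: Optional[Project], caller_team_ids: set, max_active: int = 1
) -> Project:
    """Run existence and authorization checks before any state check."""
    # 1. Existence and team membership come first, so a caller without
    #    access never learns whether the project is archived or busy.
    if project is None:
        raise ApiError(404, "Project not found")
    if project.team_id not in caller_team_ids:
        raise ApiError(403, "Forbidden")
    # 2. Only authorized callers reach the state checks.
    if project.archived:
        raise ApiError(409, "Project is archived")
    if project.active_scans >= max_active:
        raise ApiError(429, "Scan cap reached")
    return project
```

With this ordering, an outsider probing an archived project receives the same authorization error as for any other project, so the 409 cannot be used as a state oracle.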

Celery task (tasks/ingest_sbom.py, enqueue branch + include):
- ingest_sbom_task reuses persist_sbom_components → run_trivy_sbom →
  persist_trivy_findings → mark_succeeded (ref-keyed supersede). Preserves the
  uploaded SBOM as a durable sbom_cyclonedx ScanArtifact for the signature
  surface; containment-guards the path under workspace_root().

Security (Producer-Reviewer findings addressed):
- bind_audit_team before the scan INSERT so the audit row carries team_id.
- disk-write failure → 503 SbomIngestStorageError (retryable), not 422.
- release / original_filename length-capped + control-byte stripped.

Tests: pure adversarial validator unit suite (incl. depth-bomb regression),
endpoint permission×state matrix + new existence-hide-state 409 rows,
realistic multi-CVE fixture pipeline test. Docs: EN/KO ci-integration/sbom-upload.

The OpenAPI contract snapshot test (test_openapi_no_drift) flagged the new
POST /v1/projects/{project_id}/sbom-ingest path. Add it to the committed
snapshot — path param project_id only (sbom/ref/release are requestBody).
…e-pkg layer

image-scan (worker) HARD-failed on 3 node-pkg findings — lodash 4.17.19
(CVE-2021-23337, CVE-2026-4800) and minimist 1.2.5 (CVE-2021-44906) — that
live under @cyclonedx/cdxgen/node_modules. Reproduction in node:20-bookworm
shows cdxgen 11.x bundles both, while 12.3.3 AND 12.5.1 ship neither: a clean
build already lacks them, so the failure was a stale type=gha scope=worker
cache layer serving the pre-12.x install tree (same class as the earlier
php-symfony image-scan incident).

Bumping the version interpolated into the global npm install changes that
layer's cache key, forcing a fresh (clean) install — root-cause removal, not
a .trivyignore suppression (suppressing a package absent from a clean build
would wrongly mute a future regression). cdxgen invocation is unchanged across
12.3.3→12.5.1 and engines.node still allows ^20, so no scan regression. Fixes
main too (shared cache) once merged.

image-scan kept HARD-failing on lodash 4.17.19 (CVE-2021-23337, CVE-2026-4800)
and minimist 1.2.5 (CVE-2021-44906) even after the cdxgen 12.3.3→12.5.1 bump,
which only rebuilt the cdxgen layer. A fresh local install of cdxgen 12.5.1
and of npm 11.14.1 — the image's only two npm-package installers — pulls
neither package, and these CVEs were never in .trivyignore, yet image-scan
passed on #404/#405. The vulnerable copies therefore live in a stale, earlier
`scope=worker` cache layer (a non-deterministic npm-install resolution cached
long ago), not in anything the current Dockerfile produces.

Bumping the buildx GHA cache scope (worker → worker-v2) abandons the poisoned
cache and forces a single clean rebuild; the new namespace caches the clean
tree. Keeps the cdxgen 12.5.1 bump (latest 12.x, verified lodash/minimist-free).
@haksungjang haksungjang merged commit 5132078 into main Jun 14, 2026
23 of 24 checks passed
@haksungjang haksungjang deleted the feat/sbom-ingest-endpoint branch June 14, 2026 01:02
haksungjang added a commit that referenced this pull request Jun 14, 2026
…KO) (#408)

The external SBOM ingest endpoint (#406) creates scans with kind="sbom".
Surface it in the UI so ingested scans render a proper label instead of a
raw key:

- Promote ScanKind to the runtime mirror SCAN_KIND_VALUES (source, container,
  sbom), matching the backend's SCAN_KIND_VALUES tuple; derive the admin
  scan-kind filter KIND_OPTIONS from it so the two can't drift.
- Add EN/KO labels ("SBOM upload" / "SBOM 업로드") for the three dynamic
  kind→label maps: scans page badge, project-overview recent-scans badge,
  and the admin scans kind filter.
- Add a catalog-mirror contract test walking SCAN_KIND_VALUES against all
  three label maps in both locales (CLAUDE.md §2 rule 2) so a future kind
  can't ship a raw i18n key.

Badges carry no per-kind color, so sbom renders with the same outline style
as source/container — label only. i18n:check parity clean.
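The catalog-mirror contract test idea can be sketched in Python for illustration; the real test presumably lives in the frontend test suite. The map names and the source/container labels below are placeholders; only the kind values and the "SBOM upload" / "SBOM 업로드" labels come from the commit message.

```python
SCAN_KIND_VALUES = ("source", "container", "sbom")

# Hypothetical stand-ins for the three kind→label maps in each locale.
LABEL_MAPS = {
    "en": {
        "scans_badge": {"source": "Source", "container": "Container", "sbom": "SBOM upload"},
        "recent_scans_badge": {"source": "Source", "container": "Container", "sbom": "SBOM upload"},
        "admin_kind_filter": {"source": "Source", "container": "Container", "sbom": "SBOM upload"},
    },
    "ko": {
        "scans_badge": {"source": "소스", "container": "컨테이너", "sbom": "SBOM 업로드"},
        "recent_scans_badge": {"source": "소스", "container": "컨테이너", "sbom": "SBOM 업로드"},
        "admin_kind_filter": {"source": "소스", "container": "컨테이너", "sbom": "SBOM 업로드"},
    },
}


def missing_labels(kinds, label_maps):
    """List every (locale, map_name, kind) that has no non-empty label."""
    gaps = []
    for locale, maps in label_maps.items():
        for map_name, labels in maps.items():
            for kind in kinds:
                if not labels.get(kind):
                    gaps.append((locale, map_name, kind))
    return gaps
```

A contract test then asserts that `missing_labels(SCAN_KIND_VALUES, LABEL_MAPS)` is empty, so adding a kind to the tuple without labels fails the suite instead of shipping a raw key.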
haksungjang added a commit that referenced this pull request Jun 14, 2026
…model 3) (#410)

Wires the conformance scorer (sbom_conformance, merged in #409) into the
existing CycloneDX ingest pipeline (#406) and exposes the verdict:

- models/sbom_conformance.py + alembic 0033: one sbom_conformance row per
  ingested scan (scan_id UNIQUE FK CASCADE, denormalised project_id), holding
  result (pass|warn|fail), n_fail/n_warn, component_count, PURL/license/hash
  coverage, and the per-check JSONB array. Forward-only.

- tasks/ingest_sbom.py: a 'conformance' stage (progress 20) scores the ORIGINAL
  uploaded bytes and persists the verdict before component persistence. Verdict
  is advisory — a 'fail' is recorded + surfaced but does NOT abort matching.
  Persist is delete-then-insert so a Celery acks_late re-entry replaces the row
  (uq_sbom_conformance_scan_id) — _reset_scan_for_rerun does not touch it.

- GET /v1/projects/{project_id}/scans/{scan_id}/conformance (api/v1/sbom.py) +
  SbomConformanceRead schema. Existence-hide 404 for outsiders; the
  (scan_id, project_id) predicate rejects cross-project reads. OpenAPI snapshot
  updated.

- Tests: pipeline asserts the verdict row (result/coverage/checks) + a forced
  re-entry REPLACES it (no dupe). API tests cover the happy read, cross-team
  404 (permission-before-state), missing-verdict 404, wrong-project 404.

- docs (EN/KO): a 'conformance verdict' section on the SBOM-upload guide —
  endpoint, pass/warn/fail meaning, thresholds, advisory (non-blocking) note.
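The delete-then-insert persistence that makes a task re-entry replace the verdict row can be sketched with an in-memory SQLite table. The reduced schema and function names are assumptions; the actual model has more columns and uses the project's ORM and Postgres JSONB.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a cut-down stand-in for the sbom_conformance table."""
    conn.execute(
        """
        CREATE TABLE sbom_conformance (
            id INTEGER PRIMARY KEY,
            scan_id INTEGER NOT NULL UNIQUE,
            result TEXT NOT NULL CHECK (result IN ('pass', 'warn', 'fail')),
            checks TEXT NOT NULL
        )
        """
    )


def persist_verdict(conn: sqlite3.Connection, scan_id: int, result: str, checks: list) -> None:
    """Replace any existing verdict for ``scan_id`` inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sbom_conformance WHERE scan_id = ?", (scan_id,))
        conn.execute(
            "INSERT INTO sbom_conformance (scan_id, result, checks) VALUES (?, ?, ?)",
            (scan_id, result, json.dumps(checks)),
        )
```

Because the delete and insert share a transaction, a redelivered task neither trips the unique constraint nor leaves the scan without a verdict if the insert fails.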
haksungjang added a commit that referenced this pull request Jun 14, 2026
…del 3) (#411)

#406 ingest accepted CycloneDX-JSON only; the SPDX→CycloneDX converter landed in
#409 but was unwired. This enables SPDX end-to-end:

- sbom_ingest_service.validate_uploaded_sbom: format-dispatch validator. The
  O(n) byte-depth pre-check runs BEFORE any json.loads (incl. detect_format's)
  so a deeply-nested document is a clean 422, never a RecursionError → 500.
  CycloneDX keeps its existing structural gate; SPDX-JSON bounds its packages[]
  array (same cap as components[]); SPDX Tag-Value is bounded by the read cap.
  Content-type / filename allow-list gains SPDX media types + .spdx/.tag (NOT
  the over-broad text/plain). unknown / RDF / XML → 422.

- tasks/ingest_sbom._load_uploaded_sbom: maps the upload to a CycloneDX dict via
  sbom_convert.to_cyclonedx (CycloneDX passes through; SPDX JSON/TV is mapped)
  for persist_sbom_components. The ORIGINAL bytes stay on disk and are handed to
  Trivy, which auto-detects CycloneDX vs SPDX — no lossy round-trip for matching.

- Tests: validate_uploaded_sbom unit cases (real syft SPDX fixtures + adversarial
  RDF/XML/depth/cap, all local-runnable); pipeline test ingests a real SPDX-JSON
  → components persisted + conformance source_format='spdx-json'; API tests assert
  SPDX-JSON and SPDX Tag-Value uploads return 202 (the fake-bomFormat:SPDX 422
  case is unchanged — real SPDX uses spdxVersion).

- docs (EN/KO): the SBOM-upload guide now documents SPDX acceptance, the media
  type / filename allow-list, and SPDX RDF/XML being unsupported.
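A rough sketch of format dispatch along the lines described above. The function body and the return labels other than `spdx-json` are assumptions, and the sketch presumes the byte-depth pre-check has already run, since `json.loads` is called here.

```python
import json


def detect_format(raw: bytes) -> str:
    """Classify an upload as CycloneDX JSON, SPDX JSON, SPDX Tag-Value, or unknown."""
    text = raw.lstrip()
    if text.startswith(b"\xef\xbb\xbf"):  # drop a UTF-8 byte-order mark
        text = text[3:].lstrip()
    if text.startswith(b"{"):
        try:
            doc = json.loads(text)
        except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
            return "unknown"
        if not isinstance(doc, dict):
            return "unknown"
        if doc.get("bomFormat") == "CycloneDX":
            return "cyclonedx-json"
        # Real SPDX JSON carries spdxVersion, not bomFormat.
        if isinstance(doc.get("spdxVersion"), str):
            return "spdx-json"
        return "unknown"
    if text.startswith(b"<"):  # XML and RDF/XML are unsupported
        return "unknown"
    # Tag-Value documents open with an SPDXVersion tag near the top.
    for line in text.splitlines()[:20]:
        if line.strip().startswith(b"SPDXVersion:"):
            return "spdx-tag-value"
    return "unknown"
```

A validator would map `unknown` to the 422 response and apply the format-specific caps (components[] or packages[]) only after this classification.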
1 participant